ABSTRACT

The use of multivariate statistics in the social and behavioral sciences is becoming more
and more widespread. One multivariate technique that is commonly used is discriminant
function analysis. The present paper compares and contrasts the two purposes of
discriminant analysis, prediction and description. Using a heuristic data set, a conceptual
explanation of both techniques is provided with emphasis on which aspects of the computer
printouts are essential for the interpretation of each type of discriminant analysis.

To honor a reality in which we believe that any given effect can have one or many causes
and in which any given cause could have one or multiple effects, it is vital for the researcher to
understand the application of multivariate statistics (Thompson, 1986). Dolenz (1993) reported
that even though this is becoming more widely accepted in research, many graduate programs in
the social sciences carry statistics courses that focus on univariate analysis and culminate only
with a detailed look at analysis of variance (ANOVA). Empirical studies of present practice also
indicate that univariate analyses, and particularly ANOVAs, are still the predominant statistical
methods chosen in the behavioral sciences (Elmore & Woehlke, 1988; Goodwin &
Goodwin, 1985).

Studies in the social sciences comparing two or more groups very often measure subjects
on several dependent variables (Stevens, 1993). Statistical techniques which examine two or
more dependent variables simultaneously are referred to as multivariate. For example, a
researcher may want to investigate the impact of four teaching techniques (Methods A, B, C, and
D) upon the four subjects (dependent variables) of reading comprehension, arithmetic, spelling,
and problem solving. After the students are randomly assigned to one of the four classes, each
subject area is measured using an intervally scaled instrument.

A graduate student who has just finished a course in ANOVA may be tempted to analyze
the above data by doing four one-way ANOVAs, one ANOVA for each dependent variable. If
statistical significance is noted, this student would then do post hoc tests for each statistically
significant ANOVA. Fish (1988) noted two reasons why this is undesirable. First, doing four
different ANOVAs inflates the probability of a Type I "experimentwise" error. Thompson
(1994) reports that most researchers are familiar with "testwise alpha," or the probability of
making a Type I error for a given hypothesis. However, little attention is given to the probability
of making a Type I error anywhere in the study, i.e., the "experimentwise" error rate. The
"experimentwise" error rate for four one-way ANOVAs is conceptually about 4 times the testwise
alpha level ($\alpha_{TW} = .05$), or approximately 20%, for perfectly uncorrelated dependent variables.

If the dependent variables in the above example are in fact perfectly uncorrelated, the
"Bonferroni inequality" would be the more precise way of calculating the "experimentwise"
error. Applying the "Bonferroni inequality" to perfectly uncorrelated variables, the chances of
making a Type I error ($\alpha_{TW} = .05$) somewhere in our experiment would be approximately 18.55%
(Thompson, 1994).

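\[
\begin{aligned}
\alpha_{EW} &= 1 - (1 - \alpha_{TW})^{k} \\
            &= 1 - (1 - .05)^{4} \\
            &= 1 - (.95)^{4} \\
            &= 1 - .8145 \\
\alpha_{EW} &= .1855
\end{aligned}
\]

where $k = 4$ is the number of hypotheses tested.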
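
The arithmetic above can be checked with a short Python sketch. The function name `experimentwise_alpha` is an illustrative assumption rather than part of the original analysis, and the calculation assumes independent tests (perfectly uncorrelated dependent variables).

```python
def experimentwise_alpha(testwise_alpha: float, n_tests: int) -> float:
    """Probability of at least one Type I error across n_tests independent tests."""
    return 1.0 - (1.0 - testwise_alpha) ** n_tests


# Four one-way ANOVAs, each tested at the .05 level.
print(round(experimentwise_alpha(0.05, 4), 4))  # 0.1855
```

Rerunning the function with larger values of `n_tests` shows how quickly the experimentwise rate grows as more univariate tests are added.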

Researchers can control this "experimentwise" error by using the "Bonferroni correction" 
(Thompson, 1994). The "Bonferroni correction" involves the calculation of a new testwise alpha 
level, computed by dividing the testwise alpha by the number of hypotheses. However, this 
lowered alpha level could reduce statistical power and so increase the risk of Type II error. Fish (1988) and Maxwell
(1992) have both provided data sets which illustrate the paradoxical effect of failing to identify
statistically significant results when univariate tests are used inappropriately where multivariate
tests should have been employed.
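
As a minimal sketch of the "Bonferroni correction" described above, the following Python code divides the testwise alpha by the number of hypotheses and then reports the experimentwise rate that results. The function names are illustrative assumptions, and the experimentwise calculation again assumes independent tests.

```python
def bonferroni_alpha(testwise_alpha: float, n_tests: int) -> float:
    """Corrected testwise alpha: the original alpha divided by the number of hypotheses."""
    return testwise_alpha / n_tests


def experimentwise_alpha(testwise_alpha: float, n_tests: int) -> float:
    """Probability of at least one Type I error across n_tests independent tests."""
    return 1.0 - (1.0 - testwise_alpha) ** n_tests


corrected = bonferroni_alpha(0.05, 4)
print(corrected)                                     # 0.0125
print(round(experimentwise_alpha(corrected, 4), 4))  # stays just under .05
```

The corrected level holds the experimentwise rate near .05, but each individual test must now reach p < .0125, which is the loss of power noted above.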

Thompson (1994) noted that "the use of the 'Bonferroni correction' does not address the
second (and more important) reason why multivariate methods are so often vital, and so even
with this correction univariate methods usually still remain unsatisfactory" (p. 12). This "more
important reason" that Thompson (1994) refers to is the second reason reported by Fish (1988),
i.e., the use of several univariate tests does not have the ability to reflect the reality which we
believe exists. However, multivariate methods have the ability to reflect the reality of the data
from which the researcher is working. Just as independent variables can interact to produce
statistically significant results, so too can dependent variables interact to produce statistically
significant results (Thompson, 1994). This interaction of dependent variables can be detected by
the use of multivariate techniques, which can also take into account the
intercorrelations of the independent and dependent variables (Thompson, 1994).

In the present paper, the multivariate technique that will be focused upon is discriminant
function analysis. Specifically, the paper will compare and contrast descriptive discriminant
function analysis (DDA) and predictive discriminant function analysis (PDA). A data set will be
used to explain and illustrate the similarities and differences of these two techniques. While the
data used in the paper are real data from another research project, the research question has been
changed in this paper for ease of explanation. The fictional research question used to illustrate
DDA and PDA was introduced above: Does teaching method A, B, C, or D affect performance
in reading comprehension, arithmetic, spelling, and/or problem solving?

Overview 

Initially, discriminant analysis was designed to predict group membership, given a
number of continuous variables (Dolenz, 1993). For example, if incumbent candidates were
running for office and wanted to predict whether or not they were going to be re-elected, they
could gather information on previous incumbent candidates and whether or not they were
re-elected. To predict their re-election, the candidates may choose variables such as the condition of
the economy, number of foreign crises, tax rates, and any other variables that may be important
to predict re-election. From a previous sample of senators, a linear discriminant function (LDF)
can be derived such that a new individual can be placed into one of the categories of re-elected
or not re-elected (Huberty, 1975), and any senator could predict his or her own individual
chances.

The second purpose of discriminant analysis is to study and explain group separation or
group differences (Huberty & Wisenbaker, 1992). DDA techniques began to be used to describe group
differences in the 1960s (Huberty, 1975). Traditionally, DDA techniques have
been used as a follow-up to a multivariate analysis of variance (MANOVA) (Huberty & Morris,
1989). In DDA, a set of weights is obtained and a linear combination of a set of response
variables is computed to maximize between-group separation while minimizing within-group
variance (Klecka, 1980). This minimization of within-group variance and maximization of
between-group variance by the use of a set of weights is also employed in ANOVA, multiple
regression, and t-tests (Thompson, 1991).

Discriminant analysis basically consists of a set of intervally scaled variables and a set of
grouping or categorical variables. Determining which set contains the predictor variables
and which contains the criterion variables requires the research question. Each research
situation determines the direction of causation and thus whether PDA or DDA is to be
used (Klecka, 1980). If group membership is being used to predict or explain scores on the
continuous variables, DDA is used. If the scores on the continuous variables are used to predict
group membership, PDA is used. In a DDA the grouping variables are treated as independent
variables while the dependent variables are the continuous variables. In the example given
above, the independent variables are the teaching techniques while the dependent variables are
the scores in the four subject areas. If we were trying to predict which students respond better to
each of the four teaching techniques, we could use the scores on the four tests to predict class
membership. In this PDA, the dependent variable is group membership and the independent
variables are the interval scores on the four tests.

Assumptions of DDA and PDA

Klecka (1980) described seven mathematical assumptions of discriminant analysis. In
order for a discriminant analysis to be conducted, the following seven assumptions must be met:

1) two or more groups which are mutually exclusive;

2) at least two subjects per group;

3) any number of discriminating (continuous) variables can be used provided that the
number of cases exceeds the number of variables by more than two;

4) discriminating variables are measured at the interval level; 

5) no discriminating variable may be a linear combination of other discriminating 
variables; 

6) the covariance matrices for each group must be (approximately) equal, unless other 
special formulas are used; 

7) each group has been drawn from a population with a multivariate normal distribution
on the discriminating variables. 

Interpretation of DDA Results

When interpreting the results of a DDA, three questions drive our analysis of the results.
First, do the groups differ? Second, which groups differ? Third, if they do differ, on which
dependent variables do they differ? Historically, a MANOVA would be run and if statistically
significant results were found, a DDA would be run as a post hoc test. The primary run of a
one-way MANOVA program prior to a DISCRIMINANT program is unnecessary, however, given
that a one-way MANOVA and discriminant analysis are the same thing (Huberty & Wisenbaker,
1992). In fact, the SPSS MANOVA and DISCRIMINANT commands yield essentially the same
information on the computer printouts (Dolenz, 1993). Interested readers are encouraged to
"prove" this for themselves by running the SPSS syntax file presented in Appendix 1. In
discriminant analysis, the reported statistics that are of interest and will be discussed in the present
paper include canonical correlations, eigenvalues, and Wilks' lambda, as well as standardized
coefficients, structure coefficients, and an evaluation of group centroids (Dolenz, 1993).

Before looking at the results, and addressing the three questions, it is first important to 
consider whether the basic assumptions of discriminant analysis have been met. Using a 
DISCRIMINANT program, it is possible to test the assumptions associated with discriminant 
analysis (Huberty & Barton, 1989). Univariate homogeneity of variance is tested in SPSS using 
Cochran's test of homogeneity of variance and the Bartlett-Box F. The results for our data suggest
that there is no statistically significant difference in the variances of the dependent variables 
across the four teaching techniques. 

Insert Table 1 About Here 

Stevens (1992) reports that, except for rare examples, multivariate normality can be
assessed by methods for assessing univariate normality. However, caution is advised: since
univariate normality is a necessary but not a sufficient condition for multivariate normality, we
cannot conclude definitively that we have multivariate normality even if we do have univariate
and bivariate normality. However, if there were a statistically significant and noteworthy
departure from univariate normality, we could not proceed any further.

The second assumption that is tested is the homogeneity of the variance/covariance
matrices of the dependent variables across the four groups. SPSS uses Box's M as the test for
homogeneity of the variance/covariance matrices. Included in Table 2, which has been taken
directly from the computer printout, are the variance/covariance matrices for each group and the
pooled variance/covariance matrix, as well as an F test for homogeneity of variance/covariance.
Since the F statistic was not statistically significant and the test is very powerful, we can
conclude that the assumption that the matrices be approximately equal has been met (Klecka,
1980). Since there was no statistically significant difference in the variance/covariance matrices
for our data, we can proceed to answering our three questions.

Insert Table 2 About Here 

Our first question can be answered by inspecting the omnibus null hypothesis or the
multivariate test of statistical significance. The omnibus null for our data refers to the question,
do the different teaching techniques produce differences on the variables of arithmetic, reading
comprehension, spelling, and/or problem solving? For our data, Wilks' multivariate test of
significance will be used, although three other methods are also used to calculate
statistical significance for a MANOVA (Heausler, 1987). One-way MANOVA and
DISCRIMINANT results across the different teaching techniques indicated a statistically
significant difference in our data [F(12, 455.36) = 2.346, p = .006], as shown in Table 3. The
computer printout also reports univariate F-ratios for the four dependent variables.

Insert Table 3 About Here 

The second and third questions to be answered refer to which groups differ and on which
dependent variables they differ. We can answer these questions by examining the discriminant
functions. Before proceeding with these questions, it is important to understand what a
discriminant function is and how many discriminant functions are possible. Discriminant
function scores are a linear combination of the discriminating variables (intervally scaled
variables) which are formed to satisfy certain conditions; the discriminant function is the set of
weights applied to the response variables to compute these discriminant function scores. The
first condition is that the discriminant functions are derived in order to maximize the separation
of the groups (between-group variance) while minimizing the dispersion of scores within each
group (within-group variance) (Huberty, 1984).

The number of discriminant functions derived in discriminant analysis is based on the
number of groups and the number of discriminating variables. The number of functions equals
the number of groups minus one or the number of discriminating variables, whichever is smaller
(Huberty, 1975). The coefficients that compose the first function are derived to maximize the
differences between the groups. The coefficients for the second function are also derived to
maximize the dispersion of the groups with the added condition that the values on the second
function are not correlated with values on the first function (Klecka, 1980). The third function is
derived in a way which maximizes group differences without being correlated with the first or
second functions. This process continues up to the number of unique functions which can
possibly be derived, with some of the later functions being trivial and lacking statistical
significance (Dolenz, 1993).
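
The counting rule above is simple enough to state as a one-line Python function. The name `n_discriminant_functions` is an illustrative assumption; the rule itself is the one attributed to Huberty (1975) in the text.

```python
def n_discriminant_functions(n_groups: int, n_variables: int) -> int:
    """Number of functions: groups minus one or number of variables, whichever is smaller."""
    return min(n_groups - 1, n_variables)


# The teaching-method example: four groups and four dependent variables.
print(n_discriminant_functions(4, 4))  # 3
# A two-group problem yields a single function however many variables are measured.
print(n_discriminant_functions(2, 4))  # 1
```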

Since statistical significance is largely an artifact of sample size (Cohen, 1994), other
means of evaluating whether or not a researcher has found meaningful results have been
suggested. Effect size has been suggested as an alternative to statistical significance or to be used
along with statistical significance (Cohen, 1994). One effect size statistic derived from
discriminant function analysis is the canonical correlation coefficient, a measure of association
between the groups and the discriminant function (Klecka, 1980). By squaring the canonical
correlation coefficient, a statistic analogous to $\eta^2$ is derived. In the example presented above,
the first canonical correlation is .3706, making $\eta^2$ equal to .1373 or 13.73%. The researcher
could then conclude that a noteworthy amount of variance in scores on the discriminating
variables is predictable from group membership.

Insert Table 4 about here 

The most common test for statistical significance is based on Wilks' lambda (Klecka,
1980). Wilks' lambda is also an "inverse" measure, analogous to $1 - \eta^2$, with a maximum of one
and a minimum of zero. An effect size for a DDA can be calculated by subtracting the value of
Wilks' lambda from 1. In Tables 3 and 4, Wilks' lambda is reported as .85325. Therefore,
effect size could also be calculated as 1 - .85325, making the effect size equal to .14675 or
14.675%.
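
The two effect size calculations described in this section can be reproduced with the following Python sketch. The function names are illustrative assumptions; the input values (.3706 and .85325) are the ones reported in the text.

```python
def squared_canonical_correlation(canonical_r: float) -> float:
    """Effect size analogous to eta squared for a single discriminant function."""
    return canonical_r ** 2


def wilks_effect_size(wilks_lambda: float) -> float:
    """Effect size obtained by subtracting Wilks' lambda from 1."""
    return 1.0 - wilks_lambda


print(round(squared_canonical_correlation(0.3706), 4))  # 0.1373
print(round(wilks_effect_size(0.85325), 5))             # 0.14675
```

The first value describes Function 1 alone, while the second is based on the omnibus Wilks' lambda, which is why the two figures are close but not identical.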

Another statistic that is reported in discriminant analysis and can be seen in Table 4 is the
eigenvalue. Although eigenvalues cannot be interpreted directly, the relative magnitude of the
eigenvalues can be used to describe the relative value of each function (Klecka, 1980). The
function with the largest eigenvalue is the most powerful discriminator, and the functions with the
smaller eigenvalues are the least powerful at discriminating the groups. In Table 4, Function 1
has an eigenvalue of .159 and Function 2 has an eigenvalue of .011. From these two
eigenvalues, we can conclude that Function 1 discriminates about 14 times better than Function 2.
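
The comparison of eigenvalues can be sketched in Python as follows. The ratio is the calculation described above. The conversion from an eigenvalue to a canonical correlation uses the standard identity $R_c^2 = \lambda / (1 + \lambda)$, which is not stated in the text and is included here only as a consistency check. The function names are illustrative assumptions, and because the eigenvalues are reported to three decimals the agreement with the reported canonical correlation of .3706 is only approximate.

```python
import math


def relative_discrimination(eigenvalue_a: float, eigenvalue_b: float) -> float:
    """Ratio of two eigenvalues, used to compare the functions' discriminating power."""
    return eigenvalue_a / eigenvalue_b


def canonical_correlation(eigenvalue: float) -> float:
    """Canonical correlation implied by an eigenvalue: sqrt(lambda / (1 + lambda))."""
    return math.sqrt(eigenvalue / (1.0 + eigenvalue))


print(round(relative_discrimination(0.159, 0.011), 1))  # roughly 14 to 1
print(round(canonical_correlation(0.159), 3))           # close to the reported .3706
```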

Now that we have concluded that there is a statistically significant and meaningful
difference in our four teaching methods, and that these differences lie only in Function 1, we
need to turn our attention to the question, which groups differ? By looking at Table 5, and
examining the canonical discriminant functions evaluated at the group centroids, we can see that
groups 1, 2, and 3 are at approximately the same point on Function 1 and group 4 is a
considerable distance from groups 1, 2, and 3. We can therefore conclude that group 4 members
are affected most by that teaching method.

Insert Table 5 about here 

Now that we know that Function 1 discriminates group 4 from groups 1, 2, and 3, we
need to ascertain what variables compose Function 1. This is done by examining the
standardized canonical discriminant function coefficients and the structure matrix of each
function. The standardized coefficient gives that variable's relative unique contribution to
calculating the discriminant score (Klecka, 1980). Since standardized coefficients are
conceptually analogous to beta weights in regression, they cannot be interpreted alone.
Standardized coefficients are derived with the relative contribution of all variables being
considered simultaneously (Thompson, 1992). Dolenz (1993) writes,

A problem with standardized coefficients arises when variables have high
intercorrelations, causing the intercorrelating variables to "compete" for
weighted values. Conceptually, a variable that would carry a high weight if
considered alone may be "blocked" by a variable sharing the same
discriminating information. Interpretation of this blocked variable's standardized
coefficient would cause the erroneous conclusion that it was not an important
contributing variable. (pp. 11-12)

While standardized coefficients consider all variable contributions to the function
simultaneously, structure coefficients are bivariate correlations and therefore are not affected by
relationships with other variables (Klecka, 1980). Structure coefficients explain which variables
combine to compose the function. Structure coefficients can range from -1.0 to +1.0 since they
are simple correlations. By noting those variables which make up the largest portion of the
function, we can attempt to name the function (Klecka, 1980).

Insert Table 6 about here 

By examining the structure matrix in Table 6 we can see that READING COMP
correlates .88 with Function 1 and SPELLING correlates .68 with Function 1. It is the
responsibility of the researchers to rely on their own creativity and their knowledge of the
literature to name and describe each function. Since Function 1 is composed mainly of reading
comprehension and spelling, it could be concluded that teaching method D influences scores in
reading comprehension and spelling, i.e., "verbal" areas.

Interpretation of Results of PDA 

As stated earlier, the original purpose of discriminant analysis was the prediction of
group membership (Huberty & Wisenbaker, 1992). The focus in this analysis changes from the
description of the influences of group membership on the scores on intervally-scaled variables to
a focus on group classification accuracy, or the percentage of cases correctly classified based on
using intervally-scaled scores as predictor variables. How then do we decide which group a case
actually belongs in? Huberty (1994) noted that the "decision or classification or assignment rule
that is commonly used is based on the maximum likelihood principle. Assign a unit to the
population in which its observation vector has the greatest likelihood of occurrence" (p. 43).

In discriminant analysis it is possible to graph the function scores for each individual
subject in a P-dimensional space, where P refers to the number of functions that are calculated
(Klecka, 1980). Since one of the conditions placed upon function scores is to maximize
between-group variance while minimizing within-group variance, each group's members will tend to
cluster about the group centroid. Conceptually, a subject is classified based upon their position
in the P-dimensional space, with assignment going to the group whose centroid is the closest
to that particular subject's discriminant score vector.
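
The conceptual rule just described can be sketched in Python as a nearest-centroid assignment. This is only an illustration under stated assumptions: the centroid coordinates and the case's score vector below are hypothetical values, not figures from Table 5, and ordinary Euclidean distance in the discriminant function space stands in for the distance computed by the DISCRIMINANT program.

```python
import math


def nearest_centroid(score_vector, centroids):
    """Return the group whose centroid is closest to the case's discriminant score vector."""
    return min(centroids, key=lambda group: math.dist(score_vector, centroids[group]))


# Hypothetical centroids on two discriminant functions (not the values in Table 5).
centroids = {
    1: (-0.20, 0.10),
    2: (-0.10, -0.10),
    3: (-0.30, 0.00),
    4: (0.80, 0.00),
}

# A hypothetical case that scores high on Function 1.
print(nearest_centroid((0.60, 0.05), centroids))  # 4
```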

When a subject is classified into the closest group based upon this distance, this
assignment is also implicitly based upon assigning it to the group to which it has the highest
probability of belonging (Klecka, 1980). One probability that can be calculated is a "typicality
probability" (Huberty, 1994). SPSS DISCRIMINANT produces a "typicality probability" table
denoted by P(D|G), which refers to the probability of having the discriminant score vector given
membership in the stated group. Klecka (1980) describes a "typicality probability" as the chance
that a case that far from the group centroid could actually belong to that group. A small typicality
probability implies a greater distance of the discriminant score vector from the stated group
centroid (Huberty & Wisenbaker, 1992). For example, in Table 7 case 191 has a 31.10% chance
of coming from its stated group membership of group 3. Case 3, on the other hand, has a 97.10%
chance of coming from its stated group, 4. Huberty and Wisenbaker (1992) note that an object
associated with a small typicality probability of less than .10 could be considered a possible
outlier. They also suggest possible ways to deal with potential outliers.

Insert Table 7 about here 

Another type of probability that is calculated is a "posterior probability," denoted by
P(G|D), which refers to the probability of belonging to any group, given a particular score vector
(Huberty, 1994). Each subject is given a set of "posterior probabilities," one posterior probability
for each group. By definition these sets of "posterior probabilities" must sum to 1.00 (Huberty,
1994; Klecka, 1980). The reason the "posterior probabilities" sum to 1.00 across groups for each
subject can be illustrated with the following extreme case. It is possible that a subject could
have a 100% chance of belonging to group 1. This would mean that by definition this subject
would have a 0% chance of belonging to groups 2, 3, or 4. A subject is assigned to the group
to which it has the highest probability of belonging. Again, classification on the largest of these
values also is equivalent to using the smallest distance (Klecka, 1980). "Posterior probabilities"
can be calculated for each group, but SPSS reports only the two highest values for each subject.
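
The link between distances and "posterior probabilities" can be illustrated with a short Python sketch. It assumes multivariate normality, equal covariance matrices, and equal prior probabilities, so that each group's weight is proportional to exp(-d^2 / 2), where d is the distance from the case to that group's centroid. The function name and the squared distances are hypothetical and are not taken from Table 7.

```python
import math


def posterior_probabilities(squared_distances):
    """Convert squared distances to group centroids into posteriors that sum to 1.

    Assumes equal priors, multivariate normality, and equal covariance matrices.
    """
    weights = {g: math.exp(-0.5 * d2) for g, d2 in squared_distances.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


# Hypothetical squared distances from one case to the four group centroids.
squared_distances = {1: 2.4, 2: 1.9, 3: 2.1, 4: 0.6}
posteriors = posterior_probabilities(squared_distances)

print({g: round(p, 3) for g, p in posteriors.items()})
print(max(posteriors, key=posteriors.get))  # group with the smallest distance
```

Under these assumptions the group with the largest posterior probability is always the group with the smallest distance, which is the equivalence noted above.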

It is often clear which group a case should be assigned to based upon the typicality
probabilities or posterior probabilities. For example, it is clear based upon the posterior
probabilities that case number 191 "belongs" in group 4. However, it may not be readily
apparent to which group some cases belong. For example, cases 1, 2, 3, 190, and 192 all have
relatively similar "posterior probabilities." The data used in our study could be considered
to have a low level of discrimination; therefore, group membership may not be "neatly"
concluded. When this is the case, the subjects are likely to have similar probabilities for each
group. Klecka (1980) encourages researchers to be cautious about decisions surrounding these
types of cases, especially when there is evidence that the assumption of multivariate normality
has not been met.

The proportion of cases correctly predicted by the classification functions is called the hit
rate, the total focus of PDA. The higher the hit rate, the better the functions predict group
membership. Also included in Table 7 are the classification results. In this particular study,
roughly 44.13% of the cases were correctly classified based upon the functions derived from our
sample. While it would be desirable to have a higher hit rate, with our classification functions
we can predict better than chance (25%) what the group membership was. An example of a poor
hit rate would be a PDA with only two groups and a classification result of 50%. By chance
alone, we would have a 50% probability of predicting group membership correctly. Therefore, a
hit rate of 50% using predictive information results in no improvement over prediction using no
information.
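
A hit rate and its chance baseline can be computed with a few lines of Python. The sketch below is illustrative only: the function names are assumptions, the group labels are hypothetical rather than cases from Table 7, and the chance rate assumes equally likely groups.

```python
def hit_rate(actual, predicted):
    """Proportion of cases whose predicted group matches their actual group."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)


def chance_rate(n_groups):
    """Expected hit rate from guessing when all groups are equally likely."""
    return 1.0 / n_groups


# Hypothetical actual and predicted memberships for eight cases.
actual = [1, 1, 2, 2, 3, 3, 4, 4]
predicted = [1, 2, 2, 3, 3, 4, 4, 4]

print(hit_rate(actual, predicted))  # 0.625
print(chance_rate(4))               # 0.25
```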

The classification functions that were derived in the present paper were based upon an
equal probability of being assigned to a particular group. If we had prior knowledge that a
particular group had 70% of the cases, and the remaining three groups had 10% each, we would
want the evidence to be strong that a member assigned to the smaller groups actually belonged
there. This can be accomplished by adjusting the posterior probabilities by taking into account
these prior probabilities (Klecka, 1980).
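
The effect of prior probabilities can be illustrated with a minimal Python sketch based on Bayes' rule, in which each group's weight is its prior multiplied by exp(-d^2 / 2). As before, this assumes multivariate normality and equal covariance matrices. The group labels, squared distances, and function name are hypothetical; only the 70%/10%/10%/10% split comes from the text.

```python
import math


def posterior_with_priors(squared_distances, priors):
    """Posterior probabilities proportional to prior * exp(-d^2 / 2) for each group."""
    weights = {g: priors[g] * math.exp(-0.5 * squared_distances[g])
               for g in squared_distances}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


# A hypothetical case slightly closer to small group B than to large group A.
squared_distances = {"A": 1.2, "B": 1.0, "C": 6.0, "D": 7.0}

equal = posterior_with_priors(squared_distances, {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25})
unequal = posterior_with_priors(squared_distances, {"A": 0.70, "B": 0.10, "C": 0.10, "D": 0.10})

print(max(equal, key=equal.get))      # B: the nearest centroid wins under equal priors
print(max(unequal, key=unequal.get))  # A: the large group wins once priors are considered
```

With unequal priors, a case must lie considerably closer to a small group's centroid before it is assigned there, which is the stronger evidence called for above.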

Another instance in which the prior probabilities should be taken into consideration is
when the study involves relatively high stakes. Klecka (1980) refers to this as the cost of
misclassification. His example pertains to the determination of whether a patient has malignant
or benign cancer. The cost of misclassifying a person with a malignant cancer into the benign
cancer group is readily apparent. The researcher would want the evidence to be overwhelming
that cases actually belong to the benign group before they are classified. This added confidence
in the classification can be accomplished by adjusting for prior probabilities (Klecka, 1980).

Internal vs. External Hit Rates

A shortcoming of our present data is that the typicality probabilities printed by the SPSS
DISCRIMINANT program are based on an "internal analysis" (Huberty & Wisenbaker, 1992).
This method, the most common method used in the behavioral sciences, uses the data to
formulate a classification function and then classifies the same data with the obtained rule
(Huberty, Wisenbaker, & Smith, 1987). This so-called "apparent hit rate" typically yields
classification results better than a "true hit rate." A "true hit rate" refers to the classification of a
future sample based upon an empirically derived rule or function. The reasoning behind the
positive bias of an "apparent hit rate" is analogous to the maximization of $R^2$ in regression. Since
the weights are obtained by optimizing the variance of the sample at hand, sampling error
idiosyncrasies in the data will influence positively the internal hit rate (Huberty, 1994).

Another method for identifying the "true hit rate" would be an "external analysis" such
as a "holdout method" or a "leave-one-out method" (Huberty & Wisenbaker, 1992). One way of
carrying out an external classification is to randomly split the available data into two smaller
samples. With one of the sub-samples, calculate a classification function and then use the
discriminant functions to predict the membership of the other sub-sample. Typically, one
sub-sample is larger and the larger sub-sample is used to derive the classification function. The "true
hit rate" is determined by classifying the sub-sample that has been left out. Huberty, Wisenbaker,
and Smith (1987) have called this external classification method the "holdout method," since
part of the sample has been held out.
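
The "holdout method" can be sketched in Python as follows. This is a simplified illustration, not the SPSS procedure: a nearest-centroid rule on the raw scores stands in for the linear classification functions, the split is made within each group so that every group appears in the larger sub-sample, and the data, function names, and 70% split are all hypothetical.

```python
import math
import random


def centroids(cases):
    """Mean score vector per group; cases is a list of (group, scores) pairs."""
    sums, counts = {}, {}
    for group, scores in cases:
        acc = sums.setdefault(group, [0.0] * len(scores))
        for i, x in enumerate(scores):
            acc[i] += x
        counts[group] = counts.get(group, 0) + 1
    return {g: tuple(s / counts[g] for s in sums[g]) for g in sums}


def classify(scores, cents):
    """Assign a case to the group with the nearest centroid (stand-in for an LDF rule)."""
    return min(cents, key=lambda g: math.dist(scores, cents[g]))


def holdout_hit_rate(cases, train_fraction=0.7, seed=0):
    """Derive the rule on the larger sub-sample and score it on the held-out cases."""
    rng = random.Random(seed)
    by_group = {}
    for case in cases:
        by_group.setdefault(case[0], []).append(case)
    train, held_out = [], []
    for members in by_group.values():
        members = members[:]
        rng.shuffle(members)
        cut = max(1, int(len(members) * train_fraction))
        train.extend(members[:cut])
        held_out.extend(members[cut:])
    cents = centroids(train)
    hits = sum(classify(scores, cents) == group for group, scores in held_out)
    return hits / len(held_out)


# Hypothetical, well-separated two-group data with two variables each.
cases = [("A", (0.0, 0.1)), ("A", (0.2, -0.1)), ("A", (-0.1, 0.0)),
         ("A", (0.1, 0.2)), ("A", (0.0, -0.2)),
         ("B", (10.0, 10.1)), ("B", (10.2, 9.9)), ("B", (9.9, 10.0)),
         ("B", (10.1, 10.2)), ("B", (10.0, 9.8))]

print(holdout_hit_rate(cases))
```

Because the held-out cases play no part in deriving the rule, the resulting hit rate is not inflated by sampling idiosyncrasies in the way an internal hit rate is.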

Another method of calculating an external classification function is called the "leave-one-out
method" (L-O-O) (Huberty, Wisenbaker, & Smith, 1987). This method involves deleting
one subject and determining a linear classification function based upon the remaining N - 1
subjects. These linear classification functions are used to classify the deleted unit into one of the
groups. This process is carried out N times (Huberty, Wisenbaker, & Smith, 1987).
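
The "leave-one-out method" can be sketched in Python in the same spirit. Again a nearest-centroid rule stands in for the linear classification functions, and the one-variable data set is hypothetical, chosen so that two cases sit near the boundary between the groups. The internal ("apparent") hit rate is computed alongside for comparison.

```python
import math


def centroids(cases):
    """Mean score vector per group; cases is a list of (group, scores) pairs."""
    sums, counts = {}, {}
    for group, scores in cases:
        acc = sums.setdefault(group, [0.0] * len(scores))
        for i, x in enumerate(scores):
            acc[i] += x
        counts[group] = counts.get(group, 0) + 1
    return {g: tuple(s / counts[g] for s in sums[g]) for g in sums}


def classify(scores, cents):
    """Assign a case to the group with the nearest centroid (stand-in for an LDF rule)."""
    return min(cents, key=lambda g: math.dist(scores, cents[g]))


def apparent_hit_rate(cases):
    """Internal analysis: derive the rule from all cases, then classify the same cases."""
    cents = centroids(cases)
    return sum(classify(s, cents) == g for g, s in cases) / len(cases)


def leave_one_out_hit_rate(cases):
    """External analysis: classify each case with a rule built from the other N - 1 cases."""
    hits = 0
    for i, (group, scores) in enumerate(cases):
        remaining = cases[:i] + cases[i + 1:]
        hits += classify(scores, centroids(remaining)) == group
    return hits / len(cases)


# Hypothetical data: the cases scoring 2.9 and 3.1 lie close to the group boundary.
cases = [("A", (0.0,)), ("A", (1.0,)), ("A", (2.9,)),
         ("B", (3.1,)), ("B", (5.0,)), ("B", (6.0,))]

print(apparent_hit_rate(cases))
print(round(leave_one_out_hit_rate(cases), 3))
```

In this small example every case is classified correctly by the internal analysis, but the two boundary cases are misclassified once each is removed from the derivation of the rule, illustrating the positive bias of the "apparent hit rate."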

There are limitations to these alternate ways of calculating hit rates. For further
information on the drawbacks and benefits of calculating these two types of external hit rates
the reader is directed to Huberty, Wisenbaker, and Smith (1987). The detailed presentation of
these methods of hit rate calculation is beyond the scope of this work, though not because these
methods are unimportant.

A Final and Important Distinction Between DDA and PDA 

Generally, the adding of variables to a statistical analysis does not take away from effect
size, and often increases uncorrected effect sizes. This is also true for DDA (Huberty, 1994).
However, in PDA, fewer variables can yield greater classification accuracy, whereas in DDA,
fewer variables cannot yield greater discrimination (Huberty, 1994). Thompson (1995) stresses
that this is an important point and that this apparent paradox emphasizes the importance of
distinguishing DDA from PDA.

One option that is available in statistical packages such as SPSS is the plotting of 
territorial maps (Thompson, 1995). These plots indicate the boundaries of the groups and include 
notations as to the location of each subject in the variable space. Some subjects may be close to 
their group centroids on these territorial maps, while other subjects may be "fence-riders" 
that lie just within the boundaries of a particular territory. The paradoxical effect happens 
because the subjects, in the data set with more variables, will always move on the average closer 
to their respective group centroids, which results in a decreased Wilks' lambda (increasing the 
effect size). However, some subjects could move only slightly further from their group centroid 
and thereby cross into a wrong group. For example, when a variable is added, a given subject who 
was originally a correctly classified "fence-rider" could move considerably closer to its 
respective group centroid, while three other subjects who were initially correctly classified but 
also "fence-riders" could move a very small distance into the wrong group upon the addition of 
new predictor variables. The net result is an increase in effect size but the undesirable effect 
of a decrease in the hit rate (Thompson, 1995). 
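
The first half of this paradox, that Wilks' lambda cannot increase when a variable is added, can be checked numerically. The sketch below uses a small hypothetical data set (two groups, two variables) and an invented helper, `wilks_lambdas`; lambda is taken as the determinant of the within-groups SSCP matrix divided by the determinant of the total SSCP matrix. It does not reproduce the second half of the paradox: the hit rate has to be obtained separately from a classification analysis, and it may move in either direction.

```python
def sscp(rows):
    """Sums of squares and cross-products (sxx, sxy, syy) about the row means."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    syy = sum((y - my) ** 2 for _, y in rows)
    return sxx, sxy, syy


def determinant(s):
    """Determinant of the symmetric 2 x 2 matrix [[sxx, sxy], [sxy, syy]]."""
    return s[0] * s[2] - s[1] ** 2


def wilks_lambdas(groups):
    """Return (lambda using x only, lambda using x and y together)."""
    within = [0.0, 0.0, 0.0]
    for rows in groups.values():
        for k, value in enumerate(sscp(rows)):
            within[k] += value
    all_rows = [row for rows in groups.values() for row in rows]
    total = sscp(all_rows)
    lambda_x = within[0] / total[0]
    lambda_xy = determinant(within) / determinant(total)
    return lambda_x, lambda_xy


# Hypothetical scores, not the heuristic data set analyzed in this paper.
groups = {
    "A": [(1, 2), (2, 1), (3, 3)],
    "B": [(4, 2), (5, 4), (6, 3)],
}
lambda_x, lambda_xy = wilks_lambdas(groups)
print(lambda_x, lambda_xy)       # lambda with one variable, then with two
print(1 - lambda_x, 1 - lambda_xy)  # corresponding effect sizes (1 - lambda)
```

In this example the second variable lowers lambda only slightly, so the effect size rises a little; whether classification accuracy rises or falls is a separate empirical question.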

Conclusion 

Discriminant analysis techniques are widely used in educational research (Huberty 
& Barton, 1989). The present paper was not intended to be an exhaustive survey of discriminant 
analysis; rather, it has attempted to familiarize the reader with the important information that 
may be encountered when trying to read and understand research articles that have used 
discriminant analysis. Emphasis was also placed on reading and understanding 
computer-generated printouts. 

It is hoped that the reader at this point understands the differences between predictive 
discriminant analysis (PDA) and descriptive discriminant analysis (DDA). The reader has also 
been encouraged to understand how to detect violations of the assumptions of discriminant 
analysis, how to evaluate the importance of the omnibus null hypothesis, how to calculate the 
effect size, how to distinguish between the structure matrix and the canonical discriminant 
function coefficient matrix, how to evaluate which groups differ, and the importance of hit 
rates in predictive discriminant analysis. 

Table 1 

SPSS Printout: Univariate Homogeneity of Variance Tests 

Variable .. MATH 
   Cochrans C(44,4) =            .34039, P =  .123 (approx.) 
   Bartlett-Box F(3,37496) =    1.83860, P =  .138 

Variable .. SPELLING 
   Cochrans C(44,4) =            .28700, P =  .828 (approx.) 
   Bartlett-Box F(3,37496) =     .21474, P =  .886 

Variable .. READING COMP 
   Cochrans C(44,4) =            .27727, P = 1.000 (approx.) 
   Bartlett-Box F(3,37496) =     .27527, P =  .843 

Variable .. PROBLEM SOLVING 
   Cochrans C(44,4) =            .28839, P =  .797 (approx.) 
   Bartlett-Box F(3,37496) =     .64890, P =  .584 

Table 2 

SPSS Printout: Variance-Covariance Matrix for Each Group and Statistical Significance Test for 
Homogeneity of Variance-Covariance Matrices 

Cell Number .. 1 
Variance-Covariance matrix 

                   MATH    SPELLING    READING COMP    PROBLEM SOLV 
MATH             89.736 
SPELLING         -5.471     118.943 
READING COMP     -9.107     -60.786          74.593 
PROBLEM SOLV    -55.476      -8.914          28.667         107.168 

Determinant of Covariance matrix of dependent variables = 29003630.49329 
LOG(Determinant) = 17.18293 

Cell Number .. 2 
Variance-Covariance matrix 

                   MATH    SPELLING    READING COMP    PROBLEM SOLV 
MATH             52.720 
SPELLING          4.760     125.378 
READING COMP     -7.200     -39.969          58.685 
PROBLEM SOLV    -50.560      -3.000           5.400          97.680 

Determinant of Covariance matrix of dependent variables = 14675514.96235 
LOG(Determinant) = 16.50169 

Cell Number .. 3 
Variance-Covariance matrix 

                   MATH    SPELLING    READING COMP    PROBLEM SOLV 
MATH             72.349 
SPELLING         -4.635     153.606 
READING COMP     19.794     -59.822          75.171 
PROBLEM SOLV    -58.454       1.025         -23.117          92.330 

Determinant of Covariance matrix of dependent variables = 22943989.27570 
LOG(Determinant) = 16.94857 

Table 2 Continued 

Cell Number .. 4 
Variance-Covariance matrix 

                   MATH    SPELLING    READING COMP    PROBLEM SOLV 
MATH             48.819 
SPELLING         -3.956     137.283 
READING COMP     -3.005     -56.417          62.661 
PROBLEM SOLV    -25.265       5.995          15.389          74.436 

Determinant of Covariance matrix of dependent variables = 14386655.81828 
LOG(Determinant) = 16.48181 

Pooled within-cells Variance-Covariance matrix: 

                   MATH    SPELLING    READING COMP    PROBLEM SOLV 
MATH             62.266 
SPELLING         -3.150     135.179 
READING COMP      -.265     -55.622          66.981 
PROBLEM SOLV    -41.559        .734           8.916          87.882 

Determinant of pooled Covariance matrix of dependent vars. = 21708953.83377 
LOG(Determinant) = 16.89324 

Multivariate test for Homogeneity of Dispersion matrices 

Box's M =                          30.62654 
F WITH (30,35928) DF =               .96934, P = .513 (Approx.) 
Chi-Square with 30 DF =            29.10579, P = .512 (Approx.) 

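
Several entries in Tables 1 and 2 can be recovered by hand from the group variances, which may help in reading the printout. The short Python sketch below is an illustration only. It assumes group sizes of 36, 26, 36, and 81, taken from the "No. of Cases" column of the classification results in Table 7, and it uses the MATH variances printed for the four cells. Cochran's C is computed as the largest group variance divided by the sum of the group variances, and the pooled variance as the degrees-of-freedom-weighted average of the group variances.

```python
import math

# Assumed group sizes (from the classification results in Table 7).
group_sizes = [36, 26, 36, 81]

# MATH variances for cells 1 to 4, read from the diagonals in Table 2.
math_variances = [89.736, 52.720, 72.349, 48.819]

# Cochran's C: largest variance relative to the sum of all group variances.
cochran_c = max(math_variances) / sum(math_variances)

# Pooled within-cells variance: weight each variance by its degrees of freedom.
error_df = sum(n - 1 for n in group_sizes)
pooled_math_variance = sum(
    (n - 1) * v for n, v in zip(group_sizes, math_variances)
) / error_df

# Natural log of the determinant printed for cell 1.
log_det_cell_1 = math.log(29003630.49329)

print(round(cochran_c, 5))             # compare with Cochrans C for MATH in Table 1
print(error_df)                        # error degrees of freedom
print(round(pooled_math_variance, 3))  # compare with the pooled MATH variance
print(round(log_det_cell_1, 5))        # compare with LOG(Determinant) for cell 1
```

The values agree with the printout to rounding (.34039, 175, 62.266, and 17.18293), which also confirms that the pooled matrix is an average of the four cell matrices weighted by their degrees of freedom.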
Table 3 

SPSS Printout: Multivariate Test of Statistical Significance (Omnibus Null) and 
Univariate Tests of Statistical Significance 

Analysis of Variance-- design 1 

EFFECT .. TEACHING METHOD 
Multivariate Tests of Significance (S = 3, M = 0, N = 85) 

Test Name         Value    Approx. F    Hypoth. DF    Error DF    Sig. of F 

Pillais          .14825      2.26145         12.00      522.00         .009 
Hotellings       .17023      2.42100         12.00      512.00         .005 
**Wilks          .85325      2.34585         12.00      455.36         .006** 
Roys             .13732 

EFFECT .. TEACHING METHOD (Cont.) 
Univariate F-tests with (3,175) D. F. 

Variable        Hypoth. SS     Error SS    Hypoth. MS    Error MS          F    Sig. of F 

MATH               319.908    10896.528       106.635      62.266    1.71259         .166 
SPELLING          1725.308    23656.301       575.102     135.179    4.25438         .006 
READING COMP      1455.55     11721.751       485.185      66.981    7.24358         .000 
PROBLEM SOLV       215.65     15379.333        71.883      87.882     .81795         .486 

Table 4 

SPSS Printout: Canonical Discriminant Functions 

                     Pct of       Cum    Canonical 
Fcn   Eigenvalue   Variance       Pct         Corr 

 1*        .1592      93.51     93.51        .3706 
 2*        .0107       6.28     99.79        .1028 
 3*        .0004        .21    100.00        .0190 

After     Wilks' 
  Fcn     Lambda    Chi-square    df      Sig 

    0    .853250        27.614    12    .0063 
    1    .989071         1.912     6    .9276 
    2    .999637          .063     2    .9689 

* Marks the 3 canonical discriminant functions remaining in the analysis. 

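
The statistics in Tables 3 and 4 are all functions of the same three eigenvalues, and the relationships can be verified with a few lines of Python. The sketch below is illustrative only. It assumes 179 grouped cases, 4 outcome variables, and 4 groups, uses the eigenvalues as printed (rounded to four decimals, so results match the printout only approximately), takes 1 minus Wilks' lambda as one common multivariate effect size, and uses Bartlett's chi-square approximation for the test of lambda.

```python
import math

# Eigenvalues of the three discriminant functions, as printed in Table 4.
eigenvalues = [0.1592, 0.0107, 0.0004]

# Assumed design: 179 grouped cases, 4 outcome variables, 4 groups.
n_cases, n_vars, n_groups = 179, 4, 4

# Wilks' lambda is the product of 1 / (1 + eigenvalue) over all functions.
wilks = 1.0
for ev in eigenvalues:
    wilks *= 1.0 / (1.0 + ev)

pillai = sum(ev / (1.0 + ev) for ev in eigenvalues)
hotelling = sum(eigenvalues)
roy = eigenvalues[0] / (1.0 + eigenvalues[0])

# Each canonical correlation is sqrt(eigenvalue / (1 + eigenvalue)).
canonical_r = [math.sqrt(ev / (1.0 + ev)) for ev in eigenvalues]

# One common effect size: proportion of variance not left unexplained.
effect_size = 1.0 - wilks

# Bartlett's chi-square approximation for the omnibus test of lambda.
multiplier = n_cases - 1 - (n_vars + n_groups) / 2
chi_square = -multiplier * math.log(wilks)
df = n_vars * (n_groups - 1)

print(round(wilks, 4), round(pillai, 4), round(hotelling, 4), round(roy, 4))
print([round(r, 4) for r in canonical_r])
print(round(effect_size, 4), round(chi_square, 2), df)
```

Because the first eigenvalue accounts for most of the total, nearly all of the roughly 15% effect size is attributable to the first discriminant function.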
Table 5 

SPSS Printout: Canonical discriminant functions evaluated at group means (group centroids) 

Group      Func 1      Func 2      Func 3 

  A       -.34550      .00484     -.03372 
  B       -.38590     -.20331      .01855 
  C       -.35163      .14844      .01947 
  D        .43371     -.00287      .00038 

Table 6 

SPSS Printout: Standardized Canonical Discriminant Function Coefficients 
and Structure Matrix 

Standardized canonical discriminant function coefficients: 

                   Func 1      Func 2      Func 3 

MATH               .23501     1.18908     -.00699 
SPELLING           .20192      .04661      .26590 
READING COMP       .79409     -.22643      .49135 
PROBLEM SOLV      -.25202      .75391      .85607 

Structure matrix: 

Pooled within-groups correlations between discriminating variables 
and canonical discriminant functions 
(Variables ordered by size of correlation within function) 

                   Func 1      Func 2      Func 3 

READING COMP       .88187*    -.17094      .43544 
SPELLING           .67587*     .14323     -.01530 
MATH               .38027      .76486*    -.49908 
PROBLEM SOLV      -.29313      .05987      .91889* 

* denotes largest absolute correlation between each variable and any 
discriminant function. 

Table 7 

SPSS Printout: Typicality Probability and Hit Rates (Classification Results) 

 Case    Actual     Highest Probability       2nd Highest      Discrim 
Number   Group      Group  P(D/G)  P(G/D)     Group  P(G/D)    Scores 

    1    UNGRPD       3     .9390   .3019       1     .2893     -.7569     .4791    -.3443 
    2                 1     .9884   .2550       3     .2492      .0050    -.0579    -.0361 
    3    4            4     .9710   .2536       3     .2504      .0469    -.0169     .2994 
  . . . 
  190    4 **         2     .9412   .2665       4     .2628      .1201    -.5716    -.0414 
  191    3 **         4     .3110   .7619       3     .1650     1.1578    -.3814    -.2750 
  192    1            1     .1577   .3235       2     .3134    -1.7310    -.1416   -1.8392 

Classification results 

                   No. of     Predicted Group Membership 
Actual Group        Cases         1         2         3         4 

Group 1                36         3        10        13        10 
                               8.3%     27.8%     36.1%     27.8% 

Group 2                26         2        12         6         6 
                               7.7%     46.2%     23.1%     23.1% 

Group 3                36         6         7        12        11 
                              16.7%     19.4%     33.3%     30.6% 

Group 4                81         1        13        15        52 
                               1.2%     16.0%     18.5%     64.2% 

Ungrouped cases        13         2         3         1         7 
                              15.4%     23.1%      7.7%     53.8% 

Percent of "grouped" cases correctly classified: 44.13% 
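
The overall hit rate reported at the foot of Table 7 is the sum of the diagonal of the classification table divided by the number of grouped cases. The Python sketch below reproduces that arithmetic from the printed counts. Two points are assumptions: the count of 11 for Group 3 classified into Group 4 is inferred from that row's total of 36 and its 30.6%, and the 25% chance rate assumes equal prior probabilities across the four groups.

```python
# Rows: actual groups 1 to 4; columns: predicted groups 1 to 4 (Table 7 counts).
confusion = [
    [3, 10, 13, 10],
    [2, 12, 6, 6],
    [6, 7, 12, 11],   # final count inferred from the row total of 36
    [1, 13, 15, 52],
]

# Correct classifications lie on the diagonal.
hits = sum(confusion[i][i] for i in range(4))
total = sum(sum(row) for row in confusion)
hit_rate = hits / total

# Separate hit rate for each actual group.
group_hit_rates = [confusion[i][i] / sum(confusion[i]) for i in range(4)]

# Expected hit rate from random assignment under equal priors (assumption).
chance_rate = 1 / 4

print(hits, total)                          # 79 179
print(round(100 * hit_rate, 2))             # 44.13
print([round(100 * r, 1) for r in group_hit_rates])
print(hit_rate > chance_rate)
```

The group-level rates show that the overall figure is carried largely by the fourth group; the first group is classified correctly far less often than the 25% expected by chance under equal priors.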

Appendix 1 

SPSS Syntax File for MANOVA and DISCRIMINANT Programs 

MANOVA 
  math probsolv readcomp spelling BY method(1 4) 
  /DISCRIM RAW STAN ESTIM CORR ROTATE(VARIMAX) ALPHA(1) 
  /PRINT SIGNIF(MULT UNIV EIGN) SIGNIF(EFSIZE) CELLINFO(CORR) 
    CELLINFO(COV) 
    HOMOGENEITY(BARTLETT COCHRAN BOXM) 
  /NOPRINT PARAM(ESTIM) 
  /METHOD=UNIQUE 
  /ERROR WITHIN+RESIDUAL 
  /DESIGN. 

DISCRIMINANT 
  /GROUPS=method(1 4) 
  /VARIABLES=math probsolv readcomp spelling 
  /ANALYSIS ALL 
  /PRIORS EQUAL 
  /STATISTICS=MEAN STDDEV UNIVF BOXM COEFF RAW CORR COV GCOV TCOV 
    TABLE 
  /PLOT=CASES 
  /CLASSIFY=NONMISSING POOLED. 